Parameter Recovery Analysis

Foundational Report 4

foundations
validation
m_0

Validation that model parameters can be accurately recovered from simulated data, demonstrating model identifiability.

Author
Published

February 15, 2026

0.1 Introduction

Having examined the prior predictive distribution (Report 3), we now ask: can we recover the true parameters when we simulate data from our model with known parameters? This is the fundamental question of parameter recovery analysis.

Parameter recovery is a critical validation step for any Bayesian model:

  1. Identifiability: Can the model distinguish between different parameter configurations?
  2. Estimation accuracy: Are posterior means close to true values?
  3. Uncertainty calibration: Do 90% credible intervals contain the true value ~90% of the time?
  4. Precision: How narrow are the credible intervals?
WarningPreview: A Key Finding

This report reveals an asymmetry in parameter recovery: while α (sensitivity) is well-identified, the pair (β, δ) that determines the expected utility function shows weaker identification—posteriors concentrate more slowly relative to α. As shown formally in Report 5, this reflects a structural identification challenge where β and δ interact multiplicatively in the expected utility, causing uncertainty in one to propagate to the other. Importantly, this does not prevent good recovery of α, which remains our primary parameter of interest.

NoteThe Recovery Paradigm

Parameter recovery follows a simple logic:

  1. Simulate: Generate data from m_0_sim.stan with known (“true”) parameter values
  2. Estimate: Fit m_0.stan to the simulated data
  3. Compare: Assess how well posterior estimates match the true values
  4. Repeat: Do this many times to characterize recovery performance

If the model is well-specified and identifiable, the posterior should concentrate around the true values.

0.2 Study Design

We use the same study design as in Report 3 to maintain consistency across our foundational analyses:

Show code
# Study design configuration (same as Report 3)
config = {
    "M": 25,                    # Number of decision problems
    "K": 3,                     # Number of consequences  
    "D": 5,                     # Feature dimensions
    "R": 15,                    # Number of distinct alternatives
    "min_alts_per_problem": 2,  # Minimum alternatives per problem
    "max_alts_per_problem": 5,  # Maximum alternatives per problem
    "feature_dist": "normal",   # Feature distribution
    "feature_params": {"loc": 0, "scale": 1}
}

print(f"Study Design Configuration:")
print(f"  Decision problems (M): {config['M']}")
print(f"  Consequences (K): {config['K']}")
print(f"  Feature dimensions (D): {config['D']}")
print(f"  Distinct alternatives (R): {config['R']}")
print(f"  Alternatives per problem: {config['min_alts_per_problem']}-{config['max_alts_per_problem']}")
Study Design Configuration:
  Decision problems (M): 25
  Consequences (K): 3
  Feature dimensions (D): 5
  Distinct alternatives (R): 15
  Alternatives per problem: 2-5
Show code
from utils.study_design import StudyDesign

# Create and generate the study design
study = StudyDesign(
    M=config["M"],
    K=config["K"],
    D=config["D"],
    R=config["R"],
    min_alts_per_problem=config["min_alts_per_problem"],
    max_alts_per_problem=config["max_alts_per_problem"],
    feature_dist=config["feature_dist"],
    feature_params=config["feature_params"],
    design_name="parameter_recovery"
)
study.generate()

Generated Study Design:
  Total alternatives across problems: 87
  Alternatives per problem: min=2, max=5, mean=3.5

0.3 Parameter Recovery Analysis

We use the ParameterRecovery class to systematically evaluate how well parameters can be recovered. This class:

  1. Uses m_0_sim.stan to generate data with known parameters drawn from the prior
  2. Fits m_0.stan to each simulated dataset
  3. Compares posterior estimates to true values across many iterations
Show code
from analysis.parameter_recovery import ParameterRecovery
import tempfile

# Create output directory for this analysis
output_dir = tempfile.mkdtemp(prefix="param_recovery_")

# Initialize parameter recovery analysis
recovery = ParameterRecovery(
    inference_model_path=None,  # Uses default m_0.stan
    sim_model_path=None,        # Uses default m_0_sim.stan
    study_design=study,
    output_dir=output_dir,
    n_mcmc_samples=1000,        # Samples per chain
    n_mcmc_chains=4,            # Number of chains
    n_iterations=50             # Number of simulation-recovery iterations
)

# Run the analysis
true_params_list, posterior_summaries = recovery.run()
Parameter Recovery Analysis Complete:
  Successful iterations: 50
  Parameters recovered per iteration:
    - α (sensitivity): 1 parameter
    - β (feature weights): 3 × 5 = 15 parameters
    - δ (utility increments): 2 parameters

0.4 Recovery Metrics

We evaluate parameter recovery using four key metrics:

Metric Definition Target
Bias Mean(estimate - true) ≈ 0
RMSE √Mean((estimate - true)²) Small
Coverage P(true ∈ 90% CI) ≈ 0.90
CI Width Mean(upper - lower) Small but calibrated

0.4.1 Sensitivity Parameter (\(\alpha\))

Show code
# Extract alpha values
alpha_true = np.array([p['alpha'] for p in true_params_list])
alpha_mean = np.array([s.loc['alpha', 'Mean'] for s in posterior_summaries])
alpha_lower = np.array([s.loc['alpha', '5%'] for s in posterior_summaries])
alpha_upper = np.array([s.loc['alpha', '95%'] for s in posterior_summaries])

# Calculate metrics
alpha_bias = np.mean(alpha_mean - alpha_true)
alpha_rmse = np.sqrt(np.mean((alpha_mean - alpha_true)**2))
alpha_coverage = np.mean((alpha_true >= alpha_lower) & (alpha_true <= alpha_upper))
alpha_ci_width = np.mean(alpha_upper - alpha_lower)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# True vs Estimated
ax = axes[0]
ax.scatter(alpha_true, alpha_mean, alpha=0.7, s=60, c='steelblue', edgecolor='white')
lims = [min(alpha_true.min(), alpha_mean.min()) * 0.9, 
        max(alpha_true.max(), alpha_mean.max()) * 1.1]
ax.plot(lims, lims, 'r--', linewidth=2, label='Identity line')
ax.set_xlim(lims)
ax.set_ylim(lims)
ax.set_xlabel('True α', fontsize=12)
ax.set_ylabel('Estimated α (posterior mean)', fontsize=12)
ax.set_title(f'α Recovery: Bias={alpha_bias:.3f}, RMSE={alpha_rmse:.3f}', fontsize=12)
ax.legend()
ax.set_aspect('equal')

# Coverage plot
ax = axes[1]
iterations = np.arange(len(alpha_true))
for i in range(len(alpha_true)):
    covered = (alpha_true[i] >= alpha_lower[i]) & (alpha_true[i] <= alpha_upper[i])
    color = 'forestgreen' if covered else 'crimson'
    ax.plot([i, i], [alpha_lower[i], alpha_upper[i]], color=color, linewidth=2, alpha=0.7)
    ax.scatter(i, alpha_mean[i], color=color, s=40, zorder=3)

ax.scatter(iterations, alpha_true, color='black', s=60, marker='x', 
           label='True value', zorder=4, linewidth=2)
ax.set_xlabel('Iteration', fontsize=12)
ax.set_ylabel('α', fontsize=12)
ax.set_title(f'α: 90% Credible Intervals (Coverage = {alpha_coverage:.0%})', fontsize=12)
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nα Recovery Statistics:")
print(f"  Bias: {alpha_bias:.4f}")
print(f"  RMSE: {alpha_rmse:.4f}")
print(f"  90% CI Coverage: {alpha_coverage:.1%}")
print(f"  Mean CI Width: {alpha_ci_width:.3f}")
Figure 1: Recovery of the sensitivity parameter α. Left: True vs. estimated values with identity line. Right: 90% credible intervals for each iteration, colored by whether they contain the true value.

α Recovery Statistics:
  Bias: 0.1931
  RMSE: 1.1685
  90% CI Coverage: 90.0%
  Mean CI Width: 3.252

0.4.2 Feature-to-Probability Weights (\(\boldsymbol{\beta}\))

The β matrix has K × D = 15 parameters (3 consequences × 5 features). We examine recovery across all of them:

Show code
# Compute recovery metrics for all beta parameters
K, D = config['K'], config['D']
beta_recovery = []

for k in range(K):
    for d in range(D):
        param_name = f"beta[{k+1},{d+1}]"
        
        beta_true = np.array([p['beta'][k][d] for p in true_params_list])
        beta_mean = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries])
        beta_lower = np.array([s.loc[param_name, '5%'] for s in posterior_summaries])
        beta_upper = np.array([s.loc[param_name, '95%'] for s in posterior_summaries])
        
        bias = np.mean(beta_mean - beta_true)
        rmse = np.sqrt(np.mean((beta_mean - beta_true)**2))
        coverage = np.mean((beta_true >= beta_lower) & (beta_true <= beta_upper))
        ci_width = np.mean(beta_upper - beta_lower)
        
        beta_recovery.append({
            'parameter': param_name,
            'k': k + 1,
            'd': d + 1,
            'bias': bias,
            'rmse': rmse,
            'coverage': coverage,
            'ci_width': ci_width,
            'true': beta_true,
            'mean': beta_mean,
            'lower': beta_lower,
            'upper': beta_upper
        })

beta_df = pd.DataFrame(beta_recovery)
Show code
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# RMSE heatmap
ax = axes[0]
rmse_matrix = beta_df.pivot(index='k', columns='d', values='rmse')
im = ax.imshow(rmse_matrix.values, cmap='Blues', aspect='auto')
ax.set_xlabel('Feature Dimension (d)', fontsize=12)
ax.set_ylabel('Consequence (k)', fontsize=12)
ax.set_title('β RMSE by Position', fontsize=12)
ax.set_xticks(range(D))
ax.set_xticklabels([str(d+1) for d in range(D)])
ax.set_yticks(range(K))
ax.set_yticklabels([str(k+1) for k in range(K)])
for i in range(K):
    for j in range(D):
        ax.text(j, i, f'{rmse_matrix.values[i, j]:.2f}', ha='center', va='center', fontsize=10)
plt.colorbar(im, ax=ax, label='RMSE')

# Coverage bar plot
ax = axes[1]
coverage_values = beta_df['coverage'].values
x = np.arange(len(coverage_values))
colors = ['forestgreen' if c >= 0.85 else 'orange' if c >= 0.75 else 'crimson' 
          for c in coverage_values]
ax.bar(x, coverage_values, color=colors, alpha=0.7, edgecolor='white')
ax.axhline(y=0.90, color='red', linestyle='--', linewidth=2, label='Target (90%)')
ax.set_xlabel('β Parameter Index', fontsize=12)
ax.set_ylabel('Coverage', fontsize=12)
ax.set_title('β 90% CI Coverage by Parameter', fontsize=12)
ax.set_ylim(0, 1.05)
ax.legend()

plt.tight_layout()
plt.show()

print(f"\nβ Recovery Summary:")
print(f"  Mean Bias: {beta_df['bias'].mean():.4f}")
print(f"  Mean RMSE: {beta_df['rmse'].mean():.4f}")
print(f"  Mean Coverage: {beta_df['coverage'].mean():.1%}")
print(f"  Mean CI Width: {beta_df['ci_width'].mean():.3f}")
Figure 2: Summary of β parameter recovery. Left: RMSE for each β coefficient. Right: Coverage rates (target = 90%, shown as dashed line).

β Recovery Summary:
  Mean Bias: -0.0075
  Mean RMSE: 0.9523
  Mean Coverage: 90.7%
  Mean CI Width: 3.154
Show code
fig, ax = plt.subplots(figsize=(8, 8))

# Pool all beta values
all_beta_true = np.concatenate([r['true'] for r in beta_recovery])
all_beta_mean = np.concatenate([r['mean'] for r in beta_recovery])

ax.scatter(all_beta_true, all_beta_mean, alpha=0.4, s=30, c='purple', edgecolor='none')
lims = [min(all_beta_true.min(), all_beta_mean.min()) * 1.1, 
        max(all_beta_true.max(), all_beta_mean.max()) * 1.1]
ax.plot(lims, lims, 'r--', linewidth=2, label='Identity line')
ax.set_xlim(lims)
ax.set_ylim(lims)
ax.set_xlabel('True β', fontsize=12)
ax.set_ylabel('Estimated β (posterior mean)', fontsize=12)
ax.set_title(f'β Recovery (all {K*D} parameters pooled)', fontsize=12)
ax.legend()
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)

# Add correlation
corr = np.corrcoef(all_beta_true, all_beta_mean)[0, 1]
ax.text(0.05, 0.95, f'r = {corr:.3f}', transform=ax.transAxes, fontsize=12,
        verticalalignment='top', bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

plt.tight_layout()
plt.show()
Figure 3: True vs. estimated values for all β parameters pooled together. The identity line shows perfect recovery.

0.4.3 Utility Increments (\(\boldsymbol{\delta}\))

Show code
K_minus_1 = config['K'] - 1
fig, axes = plt.subplots(2, K_minus_1, figsize=(6 * K_minus_1, 10))
if K_minus_1 == 1:
    axes = axes.reshape(2, 1)

delta_stats = []

for k in range(K_minus_1):
    param_name = f"delta[{k+1}]"
    
    delta_true = np.array([p['delta'][k] for p in true_params_list])
    delta_mean = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries])
    delta_lower = np.array([s.loc[param_name, '5%'] for s in posterior_summaries])
    delta_upper = np.array([s.loc[param_name, '95%'] for s in posterior_summaries])
    
    # Metrics
    bias = np.mean(delta_mean - delta_true)
    rmse = np.sqrt(np.mean((delta_mean - delta_true)**2))
    coverage = np.mean((delta_true >= delta_lower) & (delta_true <= delta_upper))
    ci_width = np.mean(delta_upper - delta_lower)
    
    delta_stats.append({
        'parameter': f'δ_{k+1}',
        'bias': bias,
        'rmse': rmse,
        'coverage': coverage,
        'ci_width': ci_width
    })
    
    # True vs Estimated
    ax = axes[0, k]
    ax.scatter(delta_true, delta_mean, alpha=0.7, s=60, c='forestgreen', edgecolor='white')
    ax.plot([0, 1], [0, 1], 'r--', linewidth=2)
    ax.set_xlim(-0.05, 1.05)
    ax.set_ylim(-0.05, 1.05)
    ax.set_xlabel(f'True δ_{k+1}', fontsize=11)
    ax.set_ylabel(f'Estimated δ_{k+1}', fontsize=11)
    ax.set_title(f'δ_{k+1}: Bias={bias:.3f}, RMSE={rmse:.3f}', fontsize=11)
    ax.set_aspect('equal')
    
    # Coverage
    ax = axes[1, k]
    iterations = np.arange(len(delta_true))
    for i in range(len(delta_true)):
        covered = (delta_true[i] >= delta_lower[i]) & (delta_true[i] <= delta_upper[i])
        color = 'forestgreen' if covered else 'crimson'
        ax.plot([i, i], [delta_lower[i], delta_upper[i]], color=color, linewidth=2, alpha=0.7)
        ax.scatter(i, delta_mean[i], color=color, s=40, zorder=3)
    ax.scatter(iterations, delta_true, color='black', s=60, marker='x', 
               label='True value', zorder=4, linewidth=2)
    ax.set_xlabel('Iteration', fontsize=11)
    ax.set_ylabel(f'δ_{k+1}', fontsize=11)
    ax.set_title(f'δ_{k+1}: Coverage = {coverage:.0%}', fontsize=11)
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

delta_df = pd.DataFrame(delta_stats)
print(f"\nδ Recovery Summary:")
for _, row in delta_df.iterrows():
    print(f"  {row['parameter']}: Bias={row['bias']:.4f}, RMSE={row['rmse']:.4f}, Coverage={row['coverage']:.0%}")
Figure 4: Recovery of utility increment parameters δ. Each panel shows true vs. estimated values and coverage for one δ component.

δ Recovery Summary:
  δ_1: Bias=0.0497, RMSE=0.3048, Coverage=86%
  δ_2: Bias=-0.0497, RMSE=0.3048, Coverage=86%

0.5 Summary Table

Show code
# Create comprehensive summary table
summary_rows = []

# Alpha
summary_rows.append({
    'Parameter': 'α',
    'Bias': alpha_bias,
    'RMSE': alpha_rmse,
    'Coverage': alpha_coverage,
    'CI Width': alpha_ci_width,
    'Notes': 'Sensitivity'
})

# Beta (aggregated)
summary_rows.append({
    'Parameter': 'β (all)',
    'Bias': beta_df['bias'].mean(),
    'RMSE': beta_df['rmse'].mean(),
    'Coverage': beta_df['coverage'].mean(),
    'CI Width': beta_df['ci_width'].mean(),
    'Notes': f'{K}×{D} parameters'
})

# Delta (individual)
for _, row in delta_df.iterrows():
    summary_rows.append({
        'Parameter': row['parameter'],
        'Bias': row['bias'],
        'RMSE': row['rmse'],
        'Coverage': row['coverage'],
        'CI Width': row['ci_width'],
        'Notes': 'Utility increment'
    })

summary_df = pd.DataFrame(summary_rows)

# Format for display
display_df = summary_df.copy()
display_df['Bias'] = display_df['Bias'].apply(lambda x: f'{x:.4f}')
display_df['RMSE'] = display_df['RMSE'].apply(lambda x: f'{x:.4f}')
display_df['Coverage'] = display_df['Coverage'].apply(lambda x: f'{x:.0%}')
display_df['CI Width'] = display_df['CI Width'].apply(lambda x: f'{x:.3f}')

print(display_df.to_string(index=False))
Table 1: Parameter recovery statistics for all parameters in model m_0.
Parameter    Bias   RMSE Coverage CI Width             Notes
        α  0.1931 1.1685      90%    3.252       Sensitivity
  β (all) -0.0075 0.9523      91%    3.154    3×5 parameters
      δ_1  0.0497 0.3048      86%    0.892 Utility increment
      δ_2 -0.0497 0.3048      86%    0.892 Utility increment

0.6 Discussion

0.6.1 Differential Recovery Across Parameters

NoteKey Finding: Differential Parameter Recovery

The results above reveal an asymmetry in parameter recovery:

  • α (sensitivity) is recovered well, with posterior means clustering tightly around true values
  • β (feature weights) and δ (utility increments) show significantly weaker recovery, with wider credible intervals and lower precision compared to α

This pattern reflects the identification structure analyzed formally in Report 5: while α is globally identified, β and δ are only weakly identified through their composite mapping to expected utilities.

Importantly for applications focused on estimating sensitivity (α): The weaker identification of β and δ does not substantially impair α recovery. As the figures above show, α is well-recovered even when β and δ remain uncertain. If the primary research question concerns sensitivity to expected utility differences rather than the decomposition of expected utility into beliefs and values, the identification challenges for (β, δ) may be of secondary concern.

To understand why β and δ are harder to recover than α, consider the structure of the model. In decisions under uncertainty, choices depend on expected utilities:

\[ \eta_r = \sum_{k=1}^K \psi_{rk} \upsilon_k \]

where \(\boldsymbol{\psi}_r\) are subjective probabilities (determined by β) and \(\boldsymbol{\upsilon}\) are utilities (determined by δ). The key insight is that we only observe rankings of expected utilities through choices—not the utilities themselves, and not the separate contributions of beliefs and values.

This creates a multiplicative coupling between β and δ: different (β, δ) pairs can produce similar expected utility functions, making both parameters harder to pin down precisely:

  • α directly governs how sensitively choices respond to expected utility differences—it is well-identified because choice probabilities depend monotonically on α for any fixed expected utility difference
  • β and δ jointly determine expected utilities through their product; uncertainty in one propagates to the other

Note that this does not mean β and δ are completely uninformed by the data—the posteriors do concentrate relative to the prior, as the figures above show—but information accumulates more slowly than for α.

0.6.2 Does Increasing Sample Size Help?

A natural question is whether the weaker δ recovery is simply a matter of insufficient data. (Note: β exhibits similar patterns to δ due to their structural coupling, so we focus on δ as representative of the utility-function parameters.) The study design with M=25 decision problems might not provide enough information. Let’s test this by doubling the sample size to M=50.

To ensure a valid comparison, we use the extend() method to create the larger design from the original M=25 design. This keeps the same alternatives (features) and original problems, adding 25 new problems that use those same alternatives. Any differences in recovery can then be attributed to sample size rather than different feature configurations.

Show code
# Extend the original study design to M=50
# This preserves the same alternatives and original problems
study_m50 = study.extend(additional_M=25, design_name="parameter_recovery_m50")

print(f"Extended Study Design (M=50):")
print(f"  Decision problems: {study_m50.M} (original {study.M} + 25 new)")
print(f"  Same R={study_m50.R} alternatives with identical features")
print(f"  Total expected observations: ~{study_m50.M * 3.5:.0f} choices")
Extended Study Design (M=50):
  Decision problems: 50 (original 25 + 25 new)
  Same R=15 alternatives with identical features
  Total expected observations: ~175 choices
Show code
# Create output directory
output_dir_m50 = tempfile.mkdtemp(prefix="param_recovery_m50_")

# Run parameter recovery with larger sample
recovery_m50 = ParameterRecovery(
    inference_model_path=None,
    sim_model_path=None,
    study_design=study_m50,
    output_dir=output_dir_m50,
    n_mcmc_samples=1000,
    n_mcmc_chains=4,
    n_iterations=50
)

true_params_m50, posterior_summaries_m50 = recovery_m50.run()
Show code
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# RMSE comparison
ax = axes[0]
params = ['α', 'δ₁', 'δ₂']
rmse_m25 = [alpha_rmse, delta_df.iloc[0]['rmse'], delta_df.iloc[1]['rmse']]
rmse_m50 = [alpha_rmse_m50, delta_df_m50.iloc[0]['rmse'], delta_df_m50.iloc[1]['rmse']]

x = np.arange(len(params))
width = 0.35
bars1 = ax.bar(x - width/2, rmse_m25, width, label='M=25', color='steelblue', alpha=0.7)
bars2 = ax.bar(x + width/2, rmse_m50, width, label='M=50', color='coral', alpha=0.7)
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('RMSE', fontsize=12)
ax.set_title('RMSE by Sample Size', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# Coverage comparison
ax = axes[1]
cov_m25 = [alpha_coverage, delta_df.iloc[0]['coverage'], delta_df.iloc[1]['coverage']]
cov_m50 = [alpha_coverage_m50, delta_df_m50.iloc[0]['coverage'], delta_df_m50.iloc[1]['coverage']]

bars1 = ax.bar(x - width/2, cov_m25, width, label='M=25', color='steelblue', alpha=0.7)
bars2 = ax.bar(x + width/2, cov_m50, width, label='M=50', color='coral', alpha=0.7)
ax.axhline(y=0.90, color='red', linestyle='--', linewidth=2, label='Target (90%)')
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('Coverage', fontsize=12)
ax.set_title('90% CI Coverage by Sample Size', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.set_ylim(0, 1.05)
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

# CI Width comparison  
ax = axes[2]
ci_m25 = [alpha_ci_width, delta_df.iloc[0]['ci_width'], delta_df.iloc[1]['ci_width']]
ci_m50 = [alpha_ci_width_m50, delta_df_m50.iloc[0]['ci_width'], delta_df_m50.iloc[1]['ci_width']]

bars1 = ax.bar(x - width/2, ci_m25, width, label='M=25', color='steelblue', alpha=0.7)
bars2 = ax.bar(x + width/2, ci_m50, width, label='M=50', color='coral', alpha=0.7)
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('CI Width', fontsize=12)
ax.set_title('90% CI Width by Sample Size', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()
Figure 5: Comparison of parameter recovery between M=25 and M=50. Doubling the number of decision problems improves α recovery but has a more modest effect on δ recovery. (β exhibits similar patterns to δ due to their structural coupling.)
Table 2
Sample Size Comparison (M=25 vs M=50):
Parameter RMSE (M=25) RMSE (M=50) Δ RMSE CI Width (M=25) CI Width (M=50)
        α      1.1685      0.7961 -31.9%           3.252           2.535
      δ_1      0.3048      0.2955  -3.1%           0.892           0.879
      δ_2      0.3048      0.2955  -3.1%           0.892           0.879

0.6.3 Does Adding New Alternatives Help?

The M=50 extension above adds problems over the same alternatives—it doesn’t expand the feature space. But we might hypothesize that adding new alternatives (with new feature vectors) provides qualitatively different information that could help with δ recovery. (As with the previous section, β exhibits similar patterns to δ.)

To test this, we use extend_with_alternatives() to add 15 new alternatives along with 25 new problems that use those alternatives. This parallels how model m_1 (in Report 5) adds new risky alternatives along with new risky problems, enabling a fairer comparison.

Show code
# Extend with both new alternatives AND new problems
# New problems only use the new alternatives (paralleling m_1's risky component)
study_m50_new_alts = study.extend_with_alternatives(
    additional_M=25,
    additional_R=15,  # Same number of new alternatives as m_1 has risky alternatives
    design_name="parameter_recovery_m50_new_alts",
    new_problems_use_new_alts_only=True
)

print(f"Extended Study Design (M=50 with new alternatives):")
print(f"  Decision problems: {study_m50_new_alts.M} (original {study.M} + 25 new)")
print(f"  Alternatives: R={study_m50_new_alts.R} (original {study.R} + 15 new)")
print(f"  Original problems (1-25) use original alternatives (1-15)")
print(f"  New problems (26-50) use new alternatives (16-30)")
Extended Study Design (M=50 with new alternatives):
  Decision problems: 50 (original 25 + 25 new)
  Alternatives: R=30 (original 15 + 15 new)
  Original problems (1-25) use original alternatives (1-15)
  New problems (26-50) use new alternatives (16-30)
Show code
# Create output directory
output_dir_m50_new_alts = tempfile.mkdtemp(prefix="param_recovery_m50_new_alts_")

# Run parameter recovery with new alternatives
recovery_m50_new_alts = ParameterRecovery(
    inference_model_path=None,
    sim_model_path=None,
    study_design=study_m50_new_alts,
    output_dir=output_dir_m50_new_alts,
    n_mcmc_samples=1000,
    n_mcmc_chains=4,
    n_iterations=50
)

true_params_m50_new_alts, posterior_summaries_m50_new_alts = recovery_m50_new_alts.run()
Show code
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

params = ['δ₁', 'δ₂']
x = np.arange(len(params))
width = 0.25

# RMSE comparison
ax = axes[0]
rmse_m25_d = [delta_df.iloc[k]['rmse'] for k in range(K_minus_1)]
rmse_m50_d = [delta_df_m50.iloc[k]['rmse'] for k in range(K_minus_1)]
rmse_m50_new_d = [delta_stats_m50_new[k]['rmse'] for k in range(K_minus_1)]

bars1 = ax.bar(x - width, rmse_m25_d, width, label='M=25 (base)', color='steelblue', alpha=0.7)
bars2 = ax.bar(x, rmse_m50_d, width, label='M=50 (same alts)', color='coral', alpha=0.7)
bars3 = ax.bar(x + width, rmse_m50_new_d, width, label='M=50 (new alts)', color='mediumseagreen', alpha=0.7)
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('RMSE', fontsize=12)
ax.set_title('δ RMSE by Extension Type', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.legend(loc='upper right')
ax.grid(True, alpha=0.3, axis='y')

# Coverage comparison
ax = axes[1]
cov_m25_d = [delta_df.iloc[k]['coverage'] for k in range(K_minus_1)]
cov_m50_d = [delta_df_m50.iloc[k]['coverage'] for k in range(K_minus_1)]
cov_m50_new_d = [delta_stats_m50_new[k]['coverage'] for k in range(K_minus_1)]

bars1 = ax.bar(x - width, cov_m25_d, width, label='M=25 (base)', color='steelblue', alpha=0.7)
bars2 = ax.bar(x, cov_m50_d, width, label='M=50 (same alts)', color='coral', alpha=0.7)
bars3 = ax.bar(x + width, cov_m50_new_d, width, label='M=50 (new alts)', color='mediumseagreen', alpha=0.7)
ax.axhline(y=0.90, color='red', linestyle='--', linewidth=2, label='Target (90%)')
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('Coverage', fontsize=12)
ax.set_title('δ 90% CI Coverage by Extension Type', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.set_ylim(0, 1.05)
ax.legend(loc='lower right')
ax.grid(True, alpha=0.3, axis='y')

# CI Width comparison  
ax = axes[2]
ci_m25_d = [delta_df.iloc[k]['ci_width'] for k in range(K_minus_1)]
ci_m50_d = [delta_df_m50.iloc[k]['ci_width'] for k in range(K_minus_1)]
ci_m50_new_d = [delta_stats_m50_new[k]['ci_width'] for k in range(K_minus_1)]

bars1 = ax.bar(x - width, ci_m25_d, width, label='M=25 (base)', color='steelblue', alpha=0.7)
bars2 = ax.bar(x, ci_m50_d, width, label='M=50 (same alts)', color='coral', alpha=0.7)
bars3 = ax.bar(x + width, ci_m50_new_d, width, label='M=50 (new alts)', color='mediumseagreen', alpha=0.7)
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('CI Width', fontsize=12)
ax.set_title('δ 90% CI Width by Extension Type', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.legend(loc='upper right')
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()
Figure 6: Three-way comparison of δ recovery: M=25 (base), M=50 (same alternatives), and M=50 (new alternatives). Adding new alternatives may provide different information than simply adding more problems over existing alternatives. (β exhibits similar patterns to δ due to their structural coupling.)
Table 3
Three-Way Extension Comparison:
  Param    RMSE           RMSE          RMSE    CI Width       CI Width      CI Width
           M=25    M=50 (same)    M=50 (new)        M=25    M=50 (same)    M=50 (new)
-------  ------  -------------  ------------  ----------  -------------  ------------
      α  1.1685         0.7961        0.7062       3.252          2.535         2.374
    δ_1  0.3048         0.2955        0.287        0.892          0.879         0.877
    δ_2  0.3048         0.2955        0.287        0.892          0.879         0.877

0.6.4 Interpretation: Sample Size and Alternative Effects

The three-way comparison reveals important patterns about what helps δ recovery (β exhibits similar patterns due to the structural coupling between these parameters):

NoteExtension Type Matters

Comparing the three conditions:

  1. M=25 (base): Baseline recovery with modest δ precision
  2. M=50 (same alternatives): Adding 25 problems over the same R=15 alternatives
  3. M=50 (new alternatives): Adding 25 problems using 15 new alternatives (R increases to 30)

Key findings: - α recovery improves with more data regardless of extension type - δ recovery may benefit differently from new alternatives vs. more problems over existing alternatives - Adding new feature vectors expands the feature space, potentially providing different information about how features map to subjective probabilities

The comparison between conditions 2 and 3 is particularly important for understanding what helps δ recovery. If new alternatives (condition 3) outperform same alternatives (condition 2), it suggests that expanding the feature space provides qualitatively different information.

This three-way comparison sets up a crucial comparison for Report 5, where we add risky alternatives (lotteries with known probabilities) instead of additional uncertain alternatives. By comparing:

  • M=50 with new uncertain alternatives (this report): Expands uncertain feature space
  • M=25 + N=25 with new risky alternatives (Report 5): Adds known-probability lotteries

…we can isolate whether improvements come from the type of data (risk vs. uncertainty) or simply from adding more diverse choice problems.

The asymmetry in learning rates for (β, δ) versus α reflects the model structure: in decisions under uncertainty, utilities and subjective probabilities enter the likelihood through their product (expected utilities). This creates a multiplicative coupling that makes (β, δ) harder to pin down than α, which more directly governs choice sensitivity.

0.6.5 Study Design Parameters in m_0

The m_0.stan data block defines several parameters that characterize study size:

Parameter Description Current Value
M Number of decision problems 25 → 50
K Number of consequences 3
D Feature dimensions 5
R Distinct alternatives 15

We might consider varying other parameters:

  • Increasing K (more consequences): Would require estimating more δ parameters, potentially making recovery harder
  • Increasing R (more alternatives): Provides more information about β, and may indirectly help δ through better-constrained expected utilities
  • Increasing D (more features): Similar to increasing R—primarily helps with β

The effectiveness of these changes for (β, δ) recovery is an empirical question worth exploring in future work.

0.6.6 An Alternative Approach: Adding Risky Choice Data

Decision theory offers a principled alternative to simply increasing sample size: adding risky choice data where probabilities are known to the decision-maker. This approach, motivated by the Anscombe-Aumann framework, provides a different type of information that may help constrain utility parameters more directly.

In risky decisions, the probabilities are fixed by the experimenter, so choices directly reveal information about utilities without the partial confounding with subjective probabilities that occurs in decisions under uncertainty.

NotePreview of Model m_1

Model m_1 extends m_0 by adding N risky choice problems where:

  • Alternatives are lotteries with experimenter-specified probabilities
  • The same utility function υ (and hence δ) governs both choice types
  • Risky choices provide more direct information about utility structure

This provides an alternative route to improving (β, δ) recovery that may be more efficient than simply increasing M. See Report 5 for the full development and comparison.

0.7 Conclusion

Parameter recovery analysis for model m_0 reveals differential recovery across parameters:

  1. Well-recovered parameter: α (sensitivity) can be recovered with good precision
  2. Slower-learning parameters: β (feature weights) and δ (utility increments) show wider uncertainty, though they do respond to data—posteriors concentrate relative to priors, but more slowly than for α
  3. Sample size helps, but differentially: Doubling M improves α recovery more than (β, δ) recovery

This pattern reflects the identification structure of the SEU model analyzed formally in Report 5: while α is globally identified, β and δ interact multiplicatively in the expected utility, causing uncertainty in one to propagate to the other.

Implications for applications:

  • If the primary research question concerns sensitivity (α)—how strongly choices respond to expected utility differences—model m_0 performs well even with modest sample sizes
  • If precise decomposition of expected utility into beliefs (β) and values (δ) is needed, either larger samples or the addition of risky choice data (model m_1) will be required

Two paths forward for improving (β, δ) recovery:

  1. More data: Larger M (and perhaps R) may eventually yield precise (β, δ) estimates, though the rate of improvement is slower than for α
  2. Different data: Adding risky choices (model m_1) provides a complementary data source where probabilities are known, breaking the multiplicative coupling and constraining utilities more directly

The choice between these approaches—or their combination—depends on practical considerations like study feasibility and the relative importance of precise utility estimation for the research question at hand.

Reuse

Citation

BibTeX citation:
@online{helzner2026,
  author = {Helzner, Jeff},
  title = {Parameter {Recovery} {Analysis}},
  date = {2026-02-15},
  url = {https://jeffhelzner.github.io/seu-sensitivity/foundations/04_parameter_recovery.html},
  langid = {en}
}
For attribution, please cite this work as:
Helzner, Jeff. 2026. “Parameter Recovery Analysis.” SEU Sensitivity Project. February 15, 2026. https://jeffhelzner.github.io/seu-sensitivity/foundations/04_parameter_recovery.html.